Master's Thesis: Memory-based Modeling and Prioritized Sweeping in Reinforcement Learning

Author

  • Thijs Ramakers
Abstract

Reinforcement Learning (RL) is a popular method in machine learning. In RL, an agent learns a policy by observing state transitions and receiving feedback in the form of a reward signal. The learning problem can be solved through interaction with the system alone, without prior knowledge of that system. However, real-time learning from interaction alone is slow, as each time interval can be used to observe only a single state transition. Learning can be accelerated with a Dyna-style algorithm, which learns simultaneously from interaction with the real system and from a model of that system. Our research investigates two aspects of this method: building a model during learning, and incorporating this model into the learning algorithm. We use a memory-based modeling method called Local Linear Regression (LLR) to build a state-transition model during the learning process. The quality of the model is expected to increase as the number of observed state transitions grows. To assess the quality of the modeled state transitions, we introduce prediction intervals. We show that LLR is able to model various systems, including a complex humanoid robot. The LLR model was added to the learning algorithm to generate more state transitions for the agent to learn from, and we show that an increasing number of experiences leads to faster learning. We introduce Prioritized Sweeping (PS) and Look Ahead Dyna (LA Dyna) as ways to use the model more efficiently, and show how prediction intervals can be used to increase the performance of the various algorithms. The learning algorithms were compared on an inverted pendulum simulation that had to learn a swing-up control task.
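To make the Prioritized Sweeping idea concrete, the following is a minimal tabular sketch: observed transitions are stored in a deterministic model, and planning backups are drawn from a priority queue ordered by Bellman error, so computation is focused on the state-action pairs whose values would change most. All hyperparameter values (`alpha`, `gamma`, `theta`, `n_planning`) and the small chain task are illustrative assumptions, not taken from the thesis, which works with LLR models and an inverted pendulum.

```python
import heapq
from collections import defaultdict

class PrioritizedSweeping:
    """Tabular Prioritized Sweeping on a deterministic learned model (sketch)."""

    def __init__(self, alpha=0.5, gamma=0.9, theta=1e-5, n_planning=10):
        self.Q = defaultdict(float)           # Q[(s, a)] value estimates
        self.model = {}                       # learned model: (s, a) -> (r, s2)
        self.predecessors = defaultdict(set)  # s2 -> {(s, a)} leading into s2
        self.pq = []                          # max-heap via negated priorities
        self.alpha, self.gamma = alpha, gamma
        self.theta, self.n_planning = theta, n_planning

    def _max_q(self, s):
        qs = [q for (s_, _), q in self.Q.items() if s_ == s]
        return max(qs) if qs else 0.0

    def _priority(self, s, a):
        # Priority = absolute Bellman error of (s, a) under the current model.
        r, s2 = self.model[(s, a)]
        return abs(r + self.gamma * self._max_q(s2) - self.Q[(s, a)])

    def observe(self, s, a, r, s2):
        # 1. Record the real transition in the model.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        p = self._priority(s, a)
        if p > self.theta:
            heapq.heappush(self.pq, (-p, (s, a)))
        # 2. Plan: back up the highest-priority pairs first.
        for _ in range(self.n_planning):
            if not self.pq:
                break
            _, (s, a) = heapq.heappop(self.pq)
            r, s2 = self.model[(s, a)]
            target = r + self.gamma * self._max_q(s2)
            self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
            # Re-queue predecessors whose backed-up error is now significant.
            for (sp, ap) in self.predecessors[s]:
                pp = self._priority(sp, ap)
                if pp > self.theta:
                    heapq.heappush(self.pq, (-pp, (sp, ap)))

# Usage on a hypothetical 3-state chain with a terminal reward at the end:
agent = PrioritizedSweeping()
transitions = [(0, "right", 0.0, 1), (1, "right", 0.0, 2), (2, "right", 1.0, 3)]
for episode in range(20):
    for s, a, r, s2 in transitions:
        agent.observe(s, a, r, s2)
```

After a few episodes the reward information propagates backwards along the predecessor links, so with `gamma = 0.9` the values approach 1.0, 0.9, and 0.81 along the chain; this backward focusing is what lets PS extract more value from each real experience than uniform Dyna sweeps.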


Similar articles

Is prioritized sweeping the better episodic control?

Episodic control has been proposed as a third approach to reinforcement learning, besides model-free and model-based control, by analogy with the three types of human memory: episodic, procedural, and semantic. But the theoretical properties of episodic control have not been well investigated. Here I show that in deterministic tree Markov decision processes, episodic control is equivalent ...


Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations...


Prioritized Sweeping Reinforcement Learning Based Routing for MANETs

In this paper, prioritized sweeping confidence-based dual reinforcement learning for adaptive network routing is investigated. Shortest-path routing is not always suitable for a wireless mobile network: under high traffic it still selects the path with the fewest hops between source and destination, thus generating more congestion. In prior...


Generalized Prioritized Sweeping

Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent’s limited computational resources to achieve a good estimate of the value of environment states. To choose effectively where to spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are likely to have the largest errors. In this...


Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous expe...



Publication date: 2010